Memory Access Patterns on Architectures with Local Memory

Authors

Jianbin Fang

Ana Lucia Varbanescu

Abstract

Modern architectures and their programming model implementations are becoming increasingly complex and diverse, so simplistic models cannot reliably predict the performance benefit of using local memory. In this paper, we present a benchmark-based approach to this problem. We first present a two-part approach for describing memory access patterns (MAPs) in many-thread applications. For each MAP, we design benchmarks in native versions (without local memory) and optimized versions (using local memory). We then evaluate them on commonly used platforms (NVIDIA GTX280, NVIDIA GTX580, AMD HD6970, and Intel E5620), compare the performance of the native and optimized versions, and build a performance database. This database can provide essential information for the automated use of local memory.

Similar resources

FPGA Implementation of a Hammerstein Based Digital Predistorter for Linearizing RF Power Amplifiers with Memory Effects

Power amplifiers (PAs) are inherently nonlinear elements, and digital predistortion is a highly cost-effective approach to linearizing them. Although most existing architectures assume that the PA has a memoryless nonlinearity, memory effects of the PAs in many applications, such as wideband code-division multiple access (WCDMA) or orthogonal frequency-division multiplexing (OFDM), can no longer b...

Aristotle: A performance impact indicator for the OpenCL kernels using local memory

Due to the increasing complexity of multi/manycore architectures (with their mix of caches and scratch-pad memories) and applications (with different memory access patterns), the performance of many workloads becomes increasingly variable. In this work, we address one of the main causes for this performance variability: the efficiency of the memory system. Specifically, based on an empirical ev...

Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. A major obstacle to performance portability is the diverse a...

Design and Evaluation of Data Access Prediction Strategies in SDSM Systems

Software Distributed Shared Memory (SDSM) systems provide the shared memory abstraction on top of a message passing hardware, simplifying application programming in these architectures. However, some memory references exhibit long latencies due to remotely cached data. In order to hide this latency, many techniques that propagate data speculatively were developed. This requires that the data ac...

Memory Latency in Distributed Shared-Memory Multiprocessors

Analytical models were developed and simulations of memory latency were performed for Uniform Memory Access (UMA), Non-Uniform Memory Access (NUMA), Local-Remote-Global (LRG), and Replicated Concurrent-Read (RCR) architectures for hit rates from 0.1 to 0.9 in steps of 0.1, memory access times of 10 nsec to 100 nsec, proportions of read/write access from 0.01 to 0.1, and block sizes of 8 to ...

Publication date: 2012
